How to make your scientific code accessible and reproducible?

Using Git and GitHub

Steve Vissault

inSileco

April 1, 2025

Who am I?

  • Steve Vissault, M.Sc. in forest ecology
    • How climate change will shape forest distributions by the end of this century
  • Research professional at UdS for 3 years
  • Then did software development for 3 years at Omnimed, an Electronic Medical Record (EMR) system
  • Started a consulting company, inSileco, 4 years ago with 2 amazing ecologists
  • Now taking up a position at INRS with Valérie Langlois for the next 2 years

What is Git?

  • Git is a version control system: software that lets you track additions and modifications to a set of files within a folder, called a repository.
  • Its original purpose was to help groups of developers work collaboratively on big software projects.
  • Think of Git as the “Track Changes” feature of Microsoft Word on steroids.

Git - Context

Compiled software is a bunch of files. With each improvement, developers want the software to stay stable, with no regressions. For that, they need to know which files have changed between snapshots.

Why is this relevant to a scientist?

Git foundations

Git foundations: add & commit

Git foundations: branches

  • A branch (main by default) is a series of commits.
  • The most recent commit is called the head of the branch (HEAD); it holds the most up-to-date version of the files.
  • Each commit has a version of the files attached to it.
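
A minimal terminal sketch of these three points, assuming Git is installed. The temporary folder, the file name and the "Demo User" identity are placeholders chosen for illustration; the default branch may be called main or master depending on your Git configuration.

```shell
# Work in a throwaway folder so nothing else on your machine is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Placeholder identity, stored only in this repository.
git config user.name "Demo User"
git config user.email "demo@example.org"

# Two commits on the default branch.
echo 'x <- 1' > script_1.R
git add script_1.R
git commit -q -m "Add script_1.R"
first_commit="$(git rev-parse HEAD)"

echo 'y <- 2' >> script_1.R
git add script_1.R
git commit -q -m "Define y"
last_commit="$(git rev-parse HEAD)"

# The branch is the series of commits, newest first.
git log --oneline

# HEAD resolves to the most recent commit of the current branch.
git rev-parse HEAD

# Each commit keeps its own version of the file.
git show "$first_commit:script_1.R"   # version with x only
git show "$last_commit:script_1.R"    # version with x and y
```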

Git foundations: local vs remote

How Git helps scientific reproducibility

  • Git is a coding notebook: it documents not only the code but also the coding process
  • It makes your code findable and reusable when you publish it to a remote repository on GitHub
  • All the source code, data and figures are accessible from one location

In summary

  1. Initiate the repository (git init)
  2. Edit your files
  3. Stage the modifications to be committed (git add)
  4. Create a new commit object (git commit)
  5. Go back to step 2.
  6. Publish your code on the remote server (git fetch & git push)
  7. Come back from vacation
  8. Check whether another developer has released a new version of the code on the remote repository (git fetch & git pull)
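
The cycle above can be sketched in a terminal as follows. This is an illustration under assumptions, not a prescribed setup: a local bare repository stands in for GitHub so the example runs offline, and the folder names, file name and identity are hypothetical.

```shell
# A throwaway workspace; the bare repository stands in for GitHub.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# 1. Initiate the repository.
git init -q "$work/project"
cd "$work/project"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"

# 2-4. Edit a file, stage it, commit it.
echo 'print("hello")' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# 6. Publish: register the remote, then push the current branch.
git remote add origin "$work/remote.git"
git push -q -u origin HEAD

# 8. Back from vacation: check for new commits, then integrate them.
git fetch -q origin
git pull -q
git status --short --branch
```

With a real GitHub repository, the only change would be the address given to git remote add.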

How to use Git in practice?

Prerequisites

  1. Make sure R version ≥ 4.0 is installed on your computer
  2. Install Git:
  • Windows: install Git for Windows (also known as msysgit or Git Bash) to get Git and tools like the Bash shell. Download the executable here
  • macOS: open a terminal and type:
    • git --version
    • git config
    • If Git isn’t installed, macOS will prompt you to install it.
  • Linux: nothing to do, Git is likely pre-installed.

For more details: https://happygitwithr.com/install-git
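
On any of the three systems, a quick check from a terminal might look like the sketch below. The git_status variable is only there for illustration, and the name and email in the commented lines are placeholders to replace with your own.

```shell
# Is Git on the PATH? Print its version if so.
if command -v git >/dev/null 2>&1; then
  git_status="installed"
  git --version
else
  git_status="missing"
  echo "Git not found; follow the install instructions for your system."
fi

# Show the identity Git will attach to commits, if one is configured.
git config --get user.name  || echo "user.name is not set"
git config --get user.email || echo "user.email is not set"

# To set them once for your account (values are placeholders):
# git config --global user.name "Your Name"
# git config --global user.email "you@example.org"
```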

What we’re going to do

  1. Create a folder
  2. Initiate a git repo for this new folder
  3. Create a new R script (file 1) and edit it
  4. Stage this new R script in the git repo
  5. Add a commit to document the added file
  6. Create a second R script (file 2) in the git repo and edit it
  7. Repeat steps 4 and 5 for this second script
  8. Check the git log
  9. Go back to the first commit
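
For readers who prefer the terminal, the nine steps can be sketched like this. It assumes Git is installed; the temporary folder, the script contents and the identity are placeholders, and the slides that follow do the same thing through RStudio.

```shell
repo="$(mktemp -d)"                   # 1. create a folder
cd "$repo"
git init -q                           # 2. initiate the repo
git config user.name "Demo User"      # placeholder identity
git config user.email "demo@example.org"

echo 'a <- 1' > script_1.R            # 3. create and edit file 1
git add script_1.R                    # 4. stage it
git commit -q -m "Add script_1.R"     # 5. commit it

echo 'b <- 2' > script_2.R            # 6. create and edit file 2
git add script_2.R                    # 7. stage and commit again
git commit -q -m "Add script_2.R"

git log --oneline                     # 8. check the log

# 9. Go back to the first commit (the one with no parent).
first_commit="$(git rev-list --max-parents=0 HEAD)"
git checkout -q "$first_commit"
ls                                    # only script_1.R is present

# "git checkout -" would return to the branch you were on.
```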

Initiate a Git repo

To flatten the learning curve, we will use RStudio to interact with Git.

There are many other ways to interact with Git:

  • Using R with the gert package
  • Using the terminal
  • Using a GUI such as GitHub Desktop

The key concepts and processes stay the same!

To use Git in RStudio, you need to create a new Project. Let’s start by creating our project directory.

Open RStudio and create a new project from the top-right corner.

  • Select New Directory. Note that Version Control is for when you have already created a repo on GitHub.
  • Select New Project
  • Choose a name for your repo and a location. Make sure that “Create a git repository” is checked

Now look under the Git tab. You should see this:

Create a new R script

Create a new file: go to the menu File => New file => R Script. Make sure to save it with the name script_1.R.

What happens in the git tab?

Understand file status

  • Files that are untracked are represented by a yellow question mark.
  • Files that have been added (see next section) are represented by a green A.
  • Files that have been tracked and modified are represented by a blue M.
  • Files that are tracked but not modified do not show.
  • Files that have been deleted are shown with a red D.
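
The same statuses can be seen in a terminal with git status --short, which prints ?? for untracked files, A for added, M for modified and D for deleted. A minimal sketch, with hypothetical file names and a placeholder identity:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

# Three tracked files recorded in a first commit.
echo 'a <- 1' > kept.R
echo 'b <- 2' > edited.R
echo 'c <- 3' > removed.R
git add kept.R edited.R removed.R
git commit -q -m "Add three scripts"

echo 'd <- 4' > untracked.R      # never added: untracked
echo 'e <- 5' > added.R
git add added.R                  # newly staged: added
echo 'b <- 20' > edited.R        # tracked and modified
rm removed.R                     # tracked and deleted

# kept.R is tracked but unchanged, so it is not listed.
git status --short
```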

Stage a new R script

This is as simple as “checking off” the file in the Git tab. This is called Staging the file.

What is a gitignore? A .gitignore is a file that lists the files you never want Git to track. It can match certain file names (for instance, .csv or .tif files). This is useful when you need to make sure certain files (like data files or large files) do not get added.
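
A small sketch of how a .gitignore behaves, assuming Git is installed. The patterns and file names are examples only; the data/ rule is an added illustration and not something the slides prescribe.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Ignore every .csv and .tif file, plus everything under data/ (example patterns).
printf '%s\n' '*.csv' '*.tif' 'data/' > .gitignore

mkdir data
echo 'site,count' > data/raw.txt
echo 'site,count' > results.csv
echo 'x <- 1' > script_1.R

# Ignored files no longer show up as untracked.
git status --short

# Ask Git which rule is responsible for ignoring a given file.
git check-ignore -v results.csv
```

The .gitignore file itself is usually committed, so that collaborators ignore the same files.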

Create a commit

Click on Commit at the top of the file list. This window should appear:

Before committing anything you need to add a commit message. It is important to add a useful message to your commit, a bit like a journal entry, so that you can remember what you committed.

Click on commit in order to commit the changes!
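
In a terminal, the commit message is passed with -m. The sketch below uses a short subject line plus an optional longer body; the script content, the message wording and the identity are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo 'fit <- lm(y ~ x, data = df)' > script_1.R
git add script_1.R

# First -m is the short subject line; the second -m is an optional longer body.
git commit -q \
  -m "Add linear model of y against x" \
  -m "First pass at the analysis; residual checks still to do."

git log -1 --format='%s'   # subject line
git log -1 --format='%b'   # body
```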

Edit your script

Note that once you have staged a file, you can still make more changes, but you need to run git add again to add them to the index. Staged changes are not yet fully registered by Git: they are like a draft until you commit. Committing is what takes the snapshot, so commit once the staged changes are ready to be recorded.
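
A terminal sketch of this behaviour, under the same assumptions as before (temporary folder, placeholder identity, made-up script content): an edit made after staging stays out of the commit until git add is run again.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo 'a <- 1' > script_1.R
git add script_1.R                 # the index now holds the one-line version
echo 'b <- 2' >> script_1.R        # later edit, not in the index yet

# "AM" = added to the index, then modified again in the working folder.
status_before="$(git status --short)"
echo "$status_before"

# Committing now records only what was staged.
git commit -q -m "Add script_1.R"
git show HEAD:script_1.R           # prints only: a <- 1

# Stage the later edit and commit it as its own snapshot.
git add script_1.R
git commit -q -m "Define b in script_1.R"
git show HEAD:script_1.R           # prints both lines
```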